Dimensionality Reduction for Categorical Data
Authors
Abstract
Categorical attributes are those that can take a discrete set of values, e.g., colours. This work is about compressing vectors over categorical attributes to low-dimensional vectors. Current hash-based methods do not provide any guarantee on the Hamming distances between the compressed representations. Here we present FSketch, which creates sketches for sparse categorical data, and an estimator that estimates the pairwise Hamming distances among the uncompressed data only from their sketches. We claim that these sketches can be used in the usual data mining tasks in place of the original data without compromising the quality of the task. For that, we ensure that the sketches are also categorical and sparse, and that the Hamming distance estimates are reasonably precise. Both the sketch construction and the Hamming distance estimation algorithms require just a single pass; furthermore, changes to a data point can be incorporated into its sketch in an efficient manner. The compressibility depends upon how sparse the data is and is independent of the original dimension, which makes our algorithm attractive for many real-life scenarios. Our claims are backed by a rigorous theoretical analysis of the properties of FSketch and supplemented by extensive comparative evaluations with related algorithms on some real-world datasets. We show that FSketch is significantly faster, and that the accuracy obtained using its sketches is among the top for the standard unsupervised tasks of RMSE, clustering, and similarity search.
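The abstract does not spell out how the sketches are built. The Python below is a minimal sketch of one scheme with the listed properties (categorical output, single pass over non-zero entries, O(1) updates, a Hamming-distance estimator); it is an illustrative assumption, not necessarily FSketch as published. Assumed here: each attribute is hashed to one of d buckets, a bucket stores the sum of random multipliers times category values modulo a prime p larger than every category value, and category 0 means "absent". The class name CategoricalSketcher and its methods are hypothetical. If a bucket holds at least one mismatched attribute and the multipliers are uniform over Z_p, the two sketch entries differ with probability exactly 1 - 1/p, so the expected number of differing sketch coordinates is d(1 - (1 - 1/d)^h)(1 - 1/p) for true Hamming distance h; the estimator inverts this.

```python
import math
import random


class CategoricalSketcher:
    """Hypothetical FSketch-style sketcher, for illustration only.

    Assumed scheme: attribute i goes to bucket pi(i) in [0, d), and each bucket
    stores sum(r_i * x_i) mod p. Category 0 means "absent", so a sparse vector
    is a dict {attribute index: category value}. p must be a prime larger than
    every category value.
    """

    def __init__(self, n, d, p, seed=0):
        if d < 2 or p < 2:
            raise ValueError("need d >= 2 and a prime p >= 2")
        rng = random.Random(seed)
        self.d, self.p = d, p
        # Random attribute-to-bucket map pi: [n] -> [d].
        self.bucket = [rng.randrange(d) for _ in range(n)]
        # Multipliers uniform over Z_p (0 allowed), so a bucket that contains a
        # mismatch differs between two sketches with probability exactly 1 - 1/p.
        self.mult = [rng.randrange(p) for _ in range(n)]

    def sketch(self, x):
        """Build a d-dimensional sketch in one pass over the non-zero entries of x."""
        s = [0] * self.d
        for i, v in x.items():
            j = self.bucket[i]
            s[j] = (s[j] + self.mult[i] * v) % self.p
        return s

    def update(self, s, i, old, new):
        """In place, O(1): attribute i changes from category `old` to `new`."""
        j = self.bucket[i]
        s[j] = (s[j] + self.mult[i] * (new - old)) % self.p

    def estimate_hamming(self, s1, s2):
        """Estimate the Hamming distance h of the original vectors.

        Inverts E[H'] = d * (1 - (1 - 1/d)**h) * (1 - 1/p), where H' is the
        number of sketch coordinates on which s1 and s2 differ.
        """
        h_sketch = sum(a != b for a, b in zip(s1, s2))
        ratio = h_sketch / (self.d * (1 - 1 / self.p))
        if ratio >= 1:  # too many collisions for this d; estimate undefined
            return math.inf
        return math.log(1 - ratio) / math.log(1 - 1 / self.d)


if __name__ == "__main__":
    sk = CategoricalSketcher(n=10_000, d=200, p=7, seed=42)
    x = {3: 1, 50: 4, 999: 2}
    y = {3: 1, 50: 5, 1234: 3}  # true Hamming distance to x is 3
    print(sk.estimate_hamming(sk.sketch(x), sk.sketch(y)))
```

Drawing the multipliers from all of Z_p, zero included, is what makes the mismatch probability exactly 1 - 1/p and the closed-form inversion valid; d trades sketch size against bucket collisions, and the printed estimate varies with the seed.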
Similar resources
Dimensionality Reduction for Data Visualization
Dimensionality reduction is one of the basic operations in the toolbox of data-analysts and designers of machine learning and pattern recognition systems. Given a large set of measured variables but few observations, an obvious idea is to reduce the degrees of freedom in the measurements by representing them with a smaller set of more “condensed” variables. Another reason for reducing the dimen...
Data Reduction Method for Categorical Data Clustering
Categorical data clustering constitutes an important part of data mining; its relevance has recently drawn attention from several researchers. As a step in data mining, however, clustering encounters the problem of large amount of data to be processed. This article offers a solution for categorical clustering algorithms when working with high volumes of data by means of a method that summarizes...
Integrated dimensionality reduction technique for mixed-type data involving categorical values
An extension to the recent dimensionality-reduction technique t-SNE is proposed. The extension facilitates t-SNE to handle mixed-type datasets. Each attribute of the data is associated with a distance hierarchy which allows the distance between numeric values and between categorical values be measured in a unified manner. More importantly, domain knowledge regarding semantic distance between ca...
Dimensionality Reduction for Multispectral Skin Data
Principal Component Analysis (PCA), Locally Linear Embedding (LLE) and Isomap techniques can be used to process and analyze high-dimensional data domains. These methodologies create low-dimensional embeddings of the original data which are easier to work with than the initial high-dimensional data. The goal of this report is to show how the above methods can be applied to very high-dimensio...
Dimensionality reduction for financial data visualization
Various data mining methods are used for examining large financial data sets to uncover hidden and useful information. Ability to access big data sources raises new challenges related with capabilities to handle such enormous amounts of data. This research focuses on big financial data visualization that is based on dimensionality reduction methods. We use data set that contains financial ratio...
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2023
ISSN: 1558-2191, 1041-4347, 2326-3865
DOI: https://doi.org/10.1109/tkde.2021.3132373